Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Rita, Mathieu, Strub, Florian, Chaabouni, Rahma, Michel, Paul, Dupoux, Emmanuel, Pietquin, Olivier
While Reinforcement Learning (RL) has proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, which requires computationally expensive hyperparameter tuning. Moreover, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization: the reward function itself. Inspired by demonstration-guided RL, we introduce Reward Calibration from Demonstration (RCfD), which leverages human demonstrations and a reward model to recalibrate the reward objective. Formally, given a prompt, the RCfD objective minimizes the distance between the demonstrations' and the LLM's rewards rather than directly maximizing the reward function. This objective shift avoids incentivizing the LLM to exploit the reward model and promotes more natural and diverse language generation. We demonstrate the effectiveness of RCfD on three language tasks, where it achieves comparable performance to carefully tuned baselines while mitigating ROO.
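The calibration idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the squared-distance penalty is an assumption standing in for whatever distance the RCfD objective actually uses.

```python
def rcfd_objective(policy_reward: float, demo_reward: float) -> float:
    """Sketch of a demonstration-calibrated reward objective.

    Instead of maximizing policy_reward directly (which invites
    reward-model exploitation), penalize the distance between the
    reward-model score of the LLM's completion and the score of a
    human demonstration for the same prompt. The squared distance
    here is an illustrative choice, not the paper's exact metric.
    """
    return (policy_reward - demo_reward) ** 2


# A policy whose reward matches the demonstration incurs zero penalty;
# overshooting the demonstration's reward is penalized just like
# undershooting it, removing the incentive to over-optimize.
calibrated = rcfd_objective(policy_reward=0.75, demo_reward=0.75)
over_optimized = rcfd_objective(policy_reward=2.5, demo_reward=0.5)
```

In an RL loop, the negative of this penalty would replace the raw reward-model score as the training signal, so the policy is pulled toward demonstration-level rewards rather than toward the reward model's exploitable maxima.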
Separability, Contextuality, and the Quantum Frame Problem
Fields, Chris, Glazebrook, James F.
We study the relationship between assumptions of state separability and both preparation and measurement contextuality, and the relationship of both of these to the frame problem, the problem of predicting what does not change in consequence of an action. We state a quantum analog of the latter and prove its undecidability. We show how contextuality is generically induced in state preparation and measurement by basis choice, thermodynamic exchange, and the imposition of a priori causal models, and how fine-tuning assumptions appear ubiquitously in settings characterized as non-contextual.
Gravilon: Applications of a New Gradient Descent Method to Machine Learning
Kelterborn, Chad, Mazur, Marcin, Petrenko, Bogdan V.
Gradient descent algorithms have been used in countless applications since the inception of Newton's method. The explosion in the number of applications of neural networks has re-energized efforts in recent years to improve the standard gradient descent method in both efficiency and accuracy. These methods modify the effect of the gradient in updating the values of the parameters, and often incorporate hyperparameters: additional variables whose values must be specified before training begins. We introduce a novel gradient descent algorithm, called Gravilon, that uses the geometry of the hypersurface to modify the length of the step taken in the direction of the gradient. Using neural networks, we provide promising experimental results comparing the accuracy and efficiency of Gravilon against commonly used gradient descent algorithms on MNIST digit classification.